A DataFrame is a two-dimensional data structure: data is aligned in a tabular fashion in rows and columns.
A pandas DataFrame can be created with the constructor `pandas.DataFrame(data, index, columns, dtype, copy)`.
Create DataFrame: A pandas DataFrame can be created from various inputs, such as lists, tuples, dictionaries, Series, and NumPy ndarrays. With no input, the constructor returns an empty DataFrame:
import pandas as pd
df=pd.DataFrame()
print(df)
Empty DataFrame
Columns: []
Index: []
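
Whether a DataFrame holds any data can also be checked in code rather than read off the printed output. The following is a minimal sketch using the `empty` attribute and `shape`; the variable name `empty_df` is only illustrative.

```python
import pandas as pd

# Calling the constructor with no arguments gives a frame with no rows or columns.
empty_df = pd.DataFrame()

print(empty_df.empty)   # truth value telling whether the frame holds no data
print(empty_df.shape)   # tuple of (number of rows, number of columns)
```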
A DataFrame can be created from a single list or a list of lists. For comparison, the same list is first passed to a Series.
import pandas as pd
data= [1,2,5,4,6]
Ser=pd.Series(data)
Ser
0    1
1    2
2    5
3    4
4    6
dtype: int64
import pandas as pd
data= [1,2,5,4,6]
df=pd.DataFrame(data)
df
| | 0 |
|---|---|
| 0 | 1 |
| 1 | 2 |
| 2 | 5 |
| 3 | 4 |
| 4 | 6 |
Note: As you can see, a Series has no column name, whereas a DataFrame gets a default column label starting from zero (0).
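
The default label can be replaced when the DataFrame is built. The sketch below puts the two structures side by side and passes a column name through the `columns` parameter; the name `'Value'` is an assumption chosen for illustration.

```python
import pandas as pd

data = [1, 2, 5, 4, 6]

ser = pd.Series(data)                        # one-dimensional, no column label
df = pd.DataFrame(data, columns=['Value'])   # 'Value' is an illustrative name

print(ser.name)           # a Series has an optional name, which is unset here
print(list(df.columns))   # the single column now carries the label we passed
print(df.shape)           # five rows, one column
```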
data = [['Robin',26,45.34],['Karan',25,78.5],['Priya',23,87.67],['Varun',22,56],['Keisha',23,97]]
print(data)
[['Robin', 26, 45.34], ['Karan', 25, 78.5], ['Priya', 23, 87.67], ['Varun', 22, 56], ['Keisha', 23, 97]]
df=pd.DataFrame(data)
df
| | 0 | 1 | 2 |
|---|---|---|---|
| 0 | Robin | 26 | 45.34 |
| 1 | Karan | 25 | 78.50 |
| 2 | Priya | 23 | 87.67 |
| 3 | Varun | 22 | 56.00 |
| 4 | Keisha | 23 | 97.00 |
df=pd.DataFrame(data,columns=['Name','Age','Marks'])
df
| | Name | Age | Marks |
|---|---|---|---|
| 0 | Robin | 26 | 45.34 |
| 1 | Karan | 25 | 78.50 |
| 2 | Priya | 23 | 87.67 |
| 3 | Varun | 22 | 56.00 |
| 4 | Keisha | 23 | 97.00 |
data = [('Robin',26,45.34),('Karan',25,78.5),('Priya',23,87.67),('Varun',22,56),('Keisha',23,97)]
df=pd.DataFrame(data,columns=['Name','Age','Marks'])
df
| | Name | Age | Marks |
|---|---|---|---|
| 0 | Robin | 26 | 45.34 |
| 1 | Karan | 25 | 78.50 |
| 2 | Priya | 23 | 87.67 |
| 3 | Varun | 22 | 56.00 |
| 4 | Keisha | 23 | 97.00 |
A DataFrame can also be created from a dictionary of ndarrays or lists, where each key becomes a column name. All the ndarrays must be of the same length. If an index is passed, its length must equal the length of the arrays.
If no index is passed, the index defaults to range(n), where n is the array length.
import pandas as pd
data = {'Name':['Ayush', 'Priya', 'Kapil', 'Rohit'],'Age':[28,21,29,42]}
df = pd.DataFrame(data)
df
| | Name | Age |
|---|---|---|
| 0 | Ayush | 28 |
| 1 | Priya | 21 |
| 2 | Kapil | 29 |
| 3 | Rohit | 42 |
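
The length requirement is enforced by pandas itself. As a rough illustration, the sketch below shows that a `ValueError` is raised both when the lists differ in length and when the index is shorter than the lists. The dictionaries `uneven` and `even` are made-up data, and the exact error messages depend on the pandas version.

```python
import pandas as pd

# 'Age' has one value fewer than 'Name' (illustrative data).
uneven = {'Name': ['Ayush', 'Priya', 'Kapil'], 'Age': [28, 21]}

try:
    pd.DataFrame(uneven)
    length_error = None
except ValueError as exc:
    length_error = exc          # keep a reference; 'exc' vanishes after the block
print("unequal arrays:", length_error)

# Here the lists match, but the index has too few labels.
even = {'Name': ['Ayush', 'Priya'], 'Age': [28, 21]}

try:
    pd.DataFrame(even, index=['i1'])
    index_error = None
except ValueError as exc:
    index_error = exc
print("short index:", index_error)
```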
df = pd.DataFrame(data, index=['i1','i2','i3','i4'])
print(df)
     Name  Age
i1  Ayush   28
i2  Priya   21
i3  Kapil   29
i4  Rohit   42
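
Custom index labels are useful mainly because rows can then be looked up by label. A minimal sketch with `DataFrame.loc`, reusing the same data and labels:

```python
import pandas as pd

data = {'Name': ['Ayush', 'Priya', 'Kapil', 'Rohit'], 'Age': [28, 21, 29, 42]}
df = pd.DataFrame(data, index=['i1', 'i2', 'i3', 'i4'])

row = df.loc['i2']            # one row, returned as a Series keyed by column name
name = df.loc['i3', 'Name']   # a single cell, addressed by row and column label

print(row)
print(name)
```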
A list of dictionaries can be passed as input data to create a DataFrame. The dictionary keys are taken as column names by default, and a key missing from a dictionary yields NaN in that row.
import pandas as pd
data = [{'a': 12, 'b': 32},{'a': 15, 'b': 50, 'c': 23},{'a': 65, 'b': 45, 'c': 19}]
df = pd.DataFrame(data)
df
| | a | b | c |
|---|---|---|---|
| 0 | 12 | 32 | NaN |
| 1 | 15 | 50 | 23.0 |
| 2 | 65 | 45 | 19.0 |
df = pd.DataFrame(data, index=['First', 'Second','Third'])
df
| | a | b | c |
|---|---|---|---|
| First | 12 | 32 | NaN |
| Second | 15 | 50 | 23.0 |
| Third | 65 | 45 | 19.0 |
df1 = pd.DataFrame(data, index=['First', 'Second','Third'], columns=['a', 'b'])
df1
| | a | b |
|---|---|---|
| First | 12 | 32 |
| Second | 15 | 50 |
| Third | 65 | 45 |
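
The `columns` parameter selects by name, so a name that appears in none of the dictionaries still produces a column, filled entirely with NaN. The sketch below illustrates this with a shortened version of the data; `'b1'` is a made-up column name.

```python
import pandas as pd

data = [{'a': 12, 'b': 32}, {'a': 15, 'b': 50, 'c': 23}]

# 'b1' is a hypothetical column name that matches no dictionary key.
df2 = pd.DataFrame(data, index=['First', 'Second'], columns=['a', 'b1'])

print(df2)
```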
A DataFrame can be created by passing a dictionary of Series. The resulting index is the union of the indexes of all the Series passed.
import pandas as pd
data={'Col1': pd.Series([1,5,2,5,6],index=['a','b','c','d','e']), 'Col2': pd.Series([25,87,52,65,89],index=['a','b','c','d','e']) }
df=pd.DataFrame(data)
df
| | Col1 | Col2 |
|---|---|---|
| a | 1 | 25 |
| b | 5 | 87 |
| c | 2 | 52 |
| d | 5 | 65 |
| e | 6 | 89 |
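
In the example above both Series share the same index, so the union is not visible. The sketch below uses Series with different indexes (illustrative values) to show the union at work: label `'d'` exists only in `Col2`, so `Col1` gets NaN there.

```python
import pandas as pd

data = {
    'Col1': pd.Series([1, 5, 2], index=['a', 'b', 'c']),
    'Col2': pd.Series([25, 87, 52, 65], index=['a', 'b', 'c', 'd']),
}
df = pd.DataFrame(data)

# The row index is the union of both Series indexes.
print(df)
```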
data = [['Robin',26,45.34],['Karan',25,78.5],['Priya',23,87.67],['Varun',22,56],['Keisha',23,97]]
print(data)
[['Robin', 26, 45.34], ['Karan', 25, 78.5], ['Priya', 23, 87.67], ['Varun', 22, 56], ['Keisha', 23, 97]]
import numpy as np
Arr = np.array(data)
Arr
array([['Robin', '26', '45.34'],
       ['Karan', '25', '78.5'],
       ['Priya', '23', '87.67'],
       ['Varun', '22', '56'],
       ['Keisha', '23', '97']], dtype='<U32')
df = pd.DataFrame(Arr)
df
| | 0 | 1 | 2 |
|---|---|---|---|
| 0 | Robin | 26 | 45.34 |
| 1 | Karan | 25 | 78.5 |
| 2 | Priya | 23 | 87.67 |
| 3 | Varun | 22 | 56 |
| 4 | Keisha | 23 | 97 |
df = pd.DataFrame(Arr,columns = ['Name','Age','Marks'])
df
| | Name | Age | Marks |
|---|---|---|---|
| 0 | Robin | 26 | 45.34 |
| 1 | Karan | 25 | 78.5 |
| 2 | Priya | 23 | 87.67 |
| 3 | Varun | 22 | 56 |
| 4 | Keisha | 23 | 97 |
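
Note that the NumPy array above stores every value as a string (dtype `<U32`), which is why the marks print as `78.5` and `56` instead of `78.50` and `56.00`. If numeric columns are needed, one possible approach is to convert them afterwards with `pd.to_numeric`, as in this sketch on a shortened copy of the data:

```python
import numpy as np
import pandas as pd

data = [['Robin', 26, 45.34], ['Karan', 25, 78.5], ['Priya', 23, 87.67]]
arr = np.array(data)   # mixed types are coerced, so every element becomes a string

df = pd.DataFrame(arr, columns=['Name', 'Age', 'Marks'])

# Convert the text columns back to numbers.
df['Age'] = pd.to_numeric(df['Age'])
df['Marks'] = pd.to_numeric(df['Marks'])

print(df.dtypes)
```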
data = [['Robin',26,45.34],['Karan',25,78.5],['Priya',23,87.67],['Varun',22,56],['Keisha',23,97]]
print(data)
[['Robin', 26, 45.34], ['Karan', 25, 78.5], ['Priya', 23, 87.67], ['Varun', 22, 56], ['Keisha', 23, 97]]
df = pd.DataFrame(data,columns = ['Name','Age','Marks1'])
df
| | Name | Age | Marks1 |
|---|---|---|---|
| 0 | Robin | 26 | 45.34 |
| 1 | Karan | 25 | 78.50 |
| 2 | Priya | 23 | 87.67 |
| 3 | Varun | 22 | 56.00 |
| 4 | Keisha | 23 | 97.00 |
df['Name']
0     Robin
1     Karan
2     Priya
3     Varun
4    Keisha
Name: Name, dtype: object
df.Name
0     Robin
1     Karan
2     Priya
3     Varun
4    Keisha
Name: Name, dtype: object
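
Both forms return the same Series, but attribute access only works when the column name is a valid Python identifier and does not clash with an existing DataFrame attribute; a name such as `'Roll No'` has to be accessed with brackets. The sketch below, on a shortened copy of the data, also shows that passing a list of names returns a DataFrame rather than a Series.

```python
import pandas as pd

data = [['Robin', 26, 45.34], ['Karan', 25, 78.5]]
df = pd.DataFrame(data, columns=['Name', 'Age', 'Marks1'])

names = df['Name']                  # single label: a Series
subset = df[['Name', 'Age']]        # list of labels: a DataFrame
same = df['Name'].equals(df.Name)   # bracket and attribute access agree

print(type(names).__name__, type(subset).__name__, same)
```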
df
| | Name | Age | Marks1 |
|---|---|---|---|
| 0 | Robin | 26 | 45.34 |
| 1 | Karan | 25 | 78.50 |
| 2 | Priya | 23 | 87.67 |
| 3 | Varun | 22 | 56.00 |
| 4 | Keisha | 23 | 97.00 |
df['Marks2'] = [78,56,98,45,66]
df['Roll No'] = [10,11,12,13,14]
df
| | Name | Age | Marks1 | Marks2 | Roll No |
|---|---|---|---|---|---|
| 0 | Robin | 26 | 45.34 | 78 | 10 |
| 1 | Karan | 25 | 78.50 | 56 | 11 |
| 2 | Priya | 23 | 87.67 | 98 | 12 |
| 3 | Varun | 22 | 56.00 | 45 | 13 |
| 4 | Keisha | 23 | 97.00 | 66 | 14 |
df['Total Marks']=df['Marks1']+df['Marks2']
df
| | Name | Age | Marks1 | Marks2 | Roll No | Total Marks |
|---|---|---|---|---|---|---|
| 0 | Robin | 26 | 45.34 | 78 | 10 | 123.34 |
| 1 | Karan | 25 | 78.50 | 56 | 11 | 134.50 |
| 2 | Priya | 23 | 87.67 | 98 | 12 | 185.67 |
| 3 | Varun | 22 | 56.00 | 45 | 13 | 101.00 |
| 4 | Keisha | 23 | 97.00 | 66 | 14 | 163.00 |
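
Assigning to `df[...]` modifies the DataFrame in place. When the original should stay untouched, `DataFrame.assign` is one alternative: it returns a new DataFrame with the extra column. A minimal sketch on shortened data, where the column name `Total` is illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Robin', 'Karan'],
    'Marks1': [45.34, 78.5],
    'Marks2': [78, 56],
})

# assign() returns a new DataFrame and leaves df unchanged.
with_total = df.assign(Total=df['Marks1'] + df['Marks2'])

print(with_total)
print('Total' in df.columns)   # the original frame has no 'Total' column
```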
del df['Roll No']
df
| | Name | Age | Marks1 | Marks2 | Total Marks |
|---|---|---|---|---|---|
| 0 | Robin | 26 | 45.34 | 78 | 123.34 |
| 1 | Karan | 25 | 78.50 | 56 | 134.50 |
| 2 | Priya | 23 | 87.67 | 98 | 185.67 |
| 3 | Varun | 22 | 56.00 | 45 | 101.00 |
| 4 | Keisha | 23 | 97.00 | 66 | 163.00 |
df.pop('Age')
df
| | Name | Marks1 | Marks2 | Total Marks |
|---|---|---|---|---|
| 0 | Robin | 45.34 | 78 | 123.34 |
| 1 | Karan | 78.50 | 56 | 134.50 |
| 2 | Priya | 87.67 | 98 | 185.67 |
| 3 | Varun | 56.00 | 45 | 101.00 |
| 4 | Keisha | 97.00 | 66 | 163.00 |
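
`del` and `pop` both remove a column in place; the difference is that `pop` also returns the removed column as a Series. A third option, `drop(columns=...)`, returns a new DataFrame and leaves the original as it was. The sketch below contrasts `pop` and `drop` on shortened data:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Robin', 'Karan'],
    'Age': [26, 25],
    'Marks1': [45.34, 78.5],
})

ages = df.pop('Age')                    # removes 'Age' from df and returns it
trimmed = df.drop(columns=['Marks1'])   # new frame; df still has 'Marks1'

print(ages.tolist())
print(list(df.columns))
print(list(trimmed.columns))
```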
df = df.drop(0)
df
| | Name | Marks1 | Marks2 | Total Marks |
|---|---|---|---|---|
| 1 | Karan | 78.50 | 56 | 134.50 |
| 2 | Priya | 87.67 | 98 | 185.67 |
| 3 | Varun | 56.00 | 45 | 101.00 |
| 4 | Keisha | 97.00 | 66 | 163.00 |
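
As the output shows, `drop` removes the row by its label and the remaining labels are not renumbered, so the index now starts at 1. If a fresh 0-based index is wanted, `reset_index(drop=True)` is one way to get it. A minimal sketch on shortened data:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Robin', 'Karan', 'Priya'],
    'Marks1': [45.34, 78.5, 87.67],
})

remaining = df.drop(0)                          # drop the row labelled 0; df is unchanged
renumbered = remaining.reset_index(drop=True)   # relabel the rows from 0 upward

print(list(remaining.index))
print(list(renumbered.index))
```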